High performance computational biology and drug design on TianHe Supercomputers

Author

  • Shaoliang Peng
Abstract

Extremely powerful computers are needed to help scientists handle high-performance computational biology and drug design problems. BGI, the world's largest genomics institute, currently generates 6 TB of data each day. The European Bioinformatics Institute (EBI) in Hinxton currently stores 20 petabytes (1 petabyte is 10^15 bytes) of data and back-ups about genes, proteins and small molecules. TianHe supercomputers can speed up computational biology and drug design processing. In 2013, 2014, and 2015, Tianhe-2 (TH-2) topped the TOP500 list of the fastest supercomputers in the world. Many well-known bioinformatics and drug design software packages (BWA, DOCK, SOAP3-dp, SOAPdenovo, SOAPsnp, etc.) have been developed for and run on TH-2. The talk focuses on two main areas: 1. Drug Design: mD3DOCKxb is one of the largest high-throughput molecular docking platforms and completes the docking of all purchasable molecules (about 42 million) within 24 hours. It has a parallel efficiency of over 70% using 192,000 CPU cores and 1,368,000 MIC cores. It won the Gold Award of PAC 2015 (Parallel Application Challenge Competition) and was reported by CCTV 1, ScienceNet, China Science and Technology News, and the 2015 Top 10 News of Hunan Province of China. 2. Genetic Engineering: The "Human Whole Genome Re-sequencing Analysis Software Pipeline" was first designed by the author. The whole procedure takes 4 hours to analyze a 300 TB dataset of whole genome sequences from 2,000 human beings. The speedup is about 1200X. TianHe supercomputers can handle three kinds of computational biology and drug design problems: computation-intensive, memory-intensive, and communication-intensive. In the future, TH-2 will be open online to scientists not only in China but all over the world.
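The figures quoted in the abstract can be turned into average rates with a little arithmetic. The sketch below is only a back-of-the-envelope illustration: the function and variable names are hypothetical, 1 TB is taken as 1000 GB, and the 1200X speedup is assumed to refer to the same 300 TB workload, which the abstract does not state explicitly.

```python
SECONDS_PER_HOUR = 3600


def average_rate(amount, hours):
    """Average amount processed per second over the given wall-clock hours."""
    return amount / (hours * SECONDS_PER_HOUR)


# Drug design: about 42 million molecules docked within 24 hours.
docking_rate = average_rate(42_000_000, 24)        # molecules per second

# Genetic engineering: 300 TB analyzed in 4 hours (assuming 1 TB = 1000 GB).
genome_rate_gb = average_rate(300 * 1000, 4)       # GB per second

# Core count quoted for the docking run.
total_cores = 192_000 + 1_368_000

# Assumption: the 1200X speedup applies to the same workload, so the
# implied baseline runtime is 4 hours * 1200.
baseline_hours = 4 * 1200

if __name__ == "__main__":
    print(f"docking: {docking_rate:.0f} molecules/s")
    print(f"genome pipeline: {genome_rate_gb:.1f} GB/s")
    print(f"cores used: {total_cores:,}")
    print(f"implied baseline: {baseline_hours} h ({baseline_hours / 24:.0f} days)")
```

Under these assumptions the docking run averages roughly 486 molecules per second and the genome pipeline roughly 21 GB per second, and the implied baseline for the pipeline is about 200 days.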

Similar resources

Solving global shallow water equations on heterogeneous supercomputers

The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasing number of nodes in a system and the integration of heterogeneous accelerators, how to scale th...


Scaling up Hartree-Fock calculations on Tianhe-2

This paper presents a new optimized and scalable code for Hartree–Fock self-consistent field iterations. Goals of the code design include scalability to large numbers of nodes, and the capability to simultaneously use CPUs and Intel Xeon Phi coprocessors. Issues we encountered as we optimized and scaled up the code on Tianhe-2 are described and addressed. A major issue is load balance, which is...


An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer.

Big data, cloud computing, and high-performance computing (HPC) are at the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, ...


DDC: Distributed Data Collection Framework for Failure Prediction in Tianhe Supercomputers

Reliability has become an issue to the Tianhe supercomputer series with the scaling of the system. Proactive fault-tolerance based on failure prediction turns into an effective way to improve the system’s fault tolerance ability. Data collection is the basis of the failure prediction which has a great impact on the prediction accuracy, while current data collection methods for failure predictio...


Parallel Sequence Comparison and Alignment

Sequence comparison, a vital research tool in computational biology, is based on a simple O(n^2) algorithm that easily maps to a linear array of processors. This paper reviews and compares high-performance sequence analysis on general-purpose supercomputers and single-purpose, reconfigurable, and programmable co-processors. The difficulty of comparing hardware from published performance figures is als...



Publication date: 2016